Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition
نویسندگان
چکیده
In this paper, a CNN-based structure for the time-frequency localization of information is proposed Persian speech recognition. Research has shown that receptive fields’ spectrotemporal plasticity some neurons in mammals’ primary auditory cortex and midbrain makes facilities improve recognition performance. Over past few years, much work been done to localize ASR systems, using spatial or temporal immutability properties methods such as HMMs, TDNNs, CNNs, LSTM-RNNs. However, most these models have large parameter volumes are challenging train. For purpose, we presented called Time-Frequency Convolutional Maxout Neural Network (TFCMNN) which parallel time-domain frequency-domain 1D-CMNNs applied simultaneously independently spectrogram, then their outputs concatenated jointly fully connected network classification. To performance structure, used newly developed Dropout, maxout, weight normalization. Two sets experiments were designed implemented on FARSDAT dataset evaluate model compared conventional 1D-CMNN models. According experimental results, average score TFCMNN about 1.6% higher than addition, training time 17 h lower traditional Therefore, proven other sources, systems increases system accuracy speeds up process.
منابع مشابه
Convolutional deep maxout networks for phone recognition
Convolutional neural networks have recently been shown to outperform fully connected deep neural networks on several speech recognition tasks. Their superior performance is due to their convolutional structure that processes several, slightly shifted versions of the input window using the same weights, and then pools the resulting neural activations. This pooling operation makes the network les...
متن کاملEMG-based wrist gesture recognition using a convolutional neural network
Background: Deep learning has revolutionized artificial intelligence and has transformed many fields. It allows processing high-dimensional data (such as signals or images) without the need for feature engineering. The aim of this research is to develop a deep learning-based system to decode motor intent from electromyogram (EMG) signals. Methods: A myoelectric system based on convolutional ne...
متن کاملDeep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition
A deep learning approach has been widely applied in sequence modeling problems. In terms of automatic speech recognition (ASR), its performance has significantly been improved by increasing large speech corpus and deeper neural network. Especially, recurrent neural network and deep convolutional neural network have been applied in ASR successfully. Given the arising problem of training speed, w...
متن کاملPhone recognition with hierarchical convolutional deep maxout networks
Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15 % relative improvement in the word error rate of large vocabulary recognition tasks over fully connected deep networks. Here, we explore some refineme...
متن کاملImproving deep convolutional neural networks with mixed maxout units
Motivated by insights from the maxout-units-based deep Convolutional Neural Network (CNN) that "non-maximal features are unable to deliver" and "feature mapping subspace pooling is insufficient," we present a novel mixed variant of the recently introduced maxout unit called a mixout unit. Specifically, we do so by calculating the exponential probabilities of feature mappings gained by applying ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Neural Processing Letters
سال: 2022
ISSN: ['1573-773X', '1370-4621']
DOI: https://doi.org/10.1007/s11063-022-11006-1